EC 320, Set 03
Spring 2024
PS01:
Koans K01, K02, and K03:
Reading: (up to this point)
ItE: R, 1
So far we’ve identified the fundamental problem econometricians face. How do we proceed? Regressions!
Modeling is about reducing something really complicated into something simple that represents some part of the complicated reality.
Economists often rely on linear regression for statistical comparisons.
Regression analysis helps us make all-else-equal comparisons.
Running regressions provides correlative (and sometimes even causal) information about the relationship between two variables.
Ex. By how much does \(Y\) change when \(X\) increases by one unit?
Modeling forces us to be explicit about potential sources of bias.
Ex. Failing to control for confounding variables leads to omitted-variable bias, a close cousin of selection bias.
Research Question: By how much does an additional year of schooling increase future earnings?
Q. How might education increase earnings?
Q. Why might a simple comparison between high- and low-educated individuals not isolate the economic returns to education?
More education (X) increases lifetime earnings (Y)
More education (X) increases lifetime earnings (Y) along with a lot of other things (U).
More education (X) increases lifetime earnings (Y) along with a lot of other things (U). But a lot of other things (U) also impact education (X).
Any unobserved variable that opens a backdoor path between education (X) and earnings (Y) is called a confounder.
How might we estimate the causal effect of an additional year of schooling on earnings?
Approach 1: Compare average earnings of individuals who have 12 years of education to those with 16
Approach 2: Estimate a regression that compares the earnings of individuals with the same profiles
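A quick simulation can make the confounding problem concrete. This is an illustrative sketch in Python (the coefficients, variances, and seed are all made up for the example): an unobserved factor \(U\) drives both education and earnings, and a naive regression of earnings on education picks up the backdoor path.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000

# Hypothetical DGP: U (e.g., family background) drives both
# education (X) and earnings (Y). The true causal effect of X is 2.
u = rng.normal(0, 1, n)                  # unobserved confounder
x = u + rng.normal(0, 1, n)              # education depends on U
y = 2 * x + 3 * u + rng.normal(0, 1, n)  # earnings depend on X and U

# The naive simple regression of Y on X also picks up the backdoor
# path through U, so the slope is biased away from the true value of 2.
naive_slope = np.polyfit(x, y, 1)[0]
print(round(naive_slope, 2))  # roughly 3.5, not 2
```

The naive slope lands near 3.5 rather than 2 because \(U\) shifts \(X\) and \(Y\) in the same direction, exactly the comparison problem Approach 1 runs into.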
But before taking on confounders and using regression to estimate causal relationships… let’s break down the anatomy of the simple regression model.
We can estimate the effect of \(X\) on \(Y\) by estimating a regression model:
\[Y_i = \beta_0 + \beta_1 X_i + u_i\]
\(u_i\) is quite special
Consider the data-generating process of the variable \(Y_i\).
Some error will exist in all models; our aim is to minimize error under a set of constraints.
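To see what the data-generating process looks like in practice, here is a minimal sketch in Python (the parameter values \(\beta_0 = 2\), \(\beta_1 = 5\) and the noise scale are assumptions for illustration): we build \(Y_i\) from the model plus a mean-zero disturbance, then check that least squares recovers the parameters.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000

# Hypothetical DGP: Y_i = beta_0 + beta_1 * X_i + u_i,
# with beta_0 = 2, beta_1 = 5, and a mean-zero disturbance u_i.
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n)  # everything the model leaves out
y = 2 + 5 * x + u

# Because u_i is mean-zero noise here, least squares recovers the
# parameters up to sampling error (close to 2 and 5).
b1, b0 = np.polyfit(x, y, 1)
print(round(b0, 2), round(b1, 2))
```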
Five items contribute to the existence of the disturbance term:
1. Omission of independent variables
2. Aggregation of variables
3. Model misspecification
4. Functional misspecification
5. Measurement error
Using an estimator with data on \(X_i\) and \(Y_i\), we can estimate a fitted regression line:
\[ \hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
This procedure produces misses, known as residuals, \(Y_i - \hat{Y_i}\)
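A short sketch of the estimation step, using simulated data as a stand-in for \((X_i, Y_i)\) (the data and seed are illustrative): we fit \(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i\) by least squares, form the fitted values, and compute the residuals.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200

# Simulated data standing in for (X_i, Y_i); values are illustrative.
x = rng.uniform(0, 10, n)
y = 3 + 1.5 * x + rng.normal(0, 2, n)

# Fit Y_hat_i = b0_hat + b1_hat * X_i by least squares.
X = np.column_stack([np.ones(n), x])  # design matrix with an intercept
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = b0_hat + b1_hat * x  # fitted values
resid = y - y_hat            # residuals: the "misses" Y_i - Y_hat_i

# With an intercept included, least-squares residuals sum to
# (numerically) zero, even though individual misses remain.
print(abs(resid.sum()) < 1e-8)
```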
Regression is easier to think about with a concrete example.
Does the number of on-campus police officers affect campus crime rates? If so, by how much?
Always plot your data first
The scatter plot suggests that a weak positive relationship exists.
But correlation does not imply causation
Let’s estimate a statistical model.
We express the relationship between a dependent variable and an independent variable as linear:
\[ {\color{#81A1C1} \text{Crime}_i} = \beta_1 + \beta_2 {\color{#B48EAD} \text{Police}_i} + u_i. \]
\(\beta_1\) is the intercept or constant.
\(\beta_2\) is the slope coefficient.
\(u_i\) is an error term or disturbance term.
The intercept tells us the expected value of \(\text{Crime}_i\) when \(\text{Police}_i = 0\).
\[ \text{Crime}_i = {\color{#BF616A} \beta_1} + \beta_2\text{Police}_i + u_i \]
Usually not the focus of an analysis.
The slope coefficient tells us the expected change in \(\text{Crime}_i\) when \(\text{Police}_i\) increases by one.
\[ \text{Crime}_i = \beta_1 + {\color{#BF616A} \beta_2} \text{Police}_i + u_i \]
“A one-unit increase in \(\text{Police}_i\) is associated with a \(\color{#BF616A}{\beta_2}\)-unit increase in \(\text{Crime}_i\).”
Interpretation of this parameter is crucial
Under certain (strong) assumptions, \(\color{#BF616A}{\beta_2}\) is the effect of \(X_i\) on \(Y_i\).
The error term reminds us that \(\text{Police}_i\) does not perfectly explain \(Y_i\).
\[ \text{Crime}_i = \beta_1 + \beta_2\text{Police}_i + {\color{#BF616A} u_i} \]
Represents all other factors that explain \(\text{Crime}_i\).
How might we apply the simple linear regression model to our question about the effect of on-campus police on campus crime?
\[ \text{Crime}_i = \beta_1 + \beta_2\text{Police}_i + u_i. \]
\(\beta_1\) and \(\beta_2\) are the unobserved population parameters we want.
We estimate them with \(\hat{\beta_1}\) and \(\hat{\beta_2}\).
\(\hat{\beta_1}\) and \(\hat{\beta_2}\) generate predictions of \(\text{Crime}_i\) called \(\widehat{\text{Crime}_i}\).
We call the predictions of the dependent variable fitted values.
So the question becomes: how do we pick \(\hat{\beta_1}\) and \(\hat{\beta_2}\)?
Let’s take some guesses: \(\hat{\beta_1} = 60\) and \(\hat{\beta_2}=-7\)
Does this line represent the data well?
Let’s take some guesses: \(\hat{\beta_1} = 30\) and \(\hat{\beta_2}=0\)
What about this one?
Let’s take some guesses: \(\hat{\beta_1} = 15.6\) and \(\hat{\beta_2}=7.94\)
Or this one?
Using \(\hat{\beta_1}\) and \(\hat{\beta_2}\) to make \(\hat{Y_i}\) generates misses.
We call these misses residuals:
\[ {\color{#BF616A} \hat{u}_i} = {\color{#BF616A}Y_i - \hat{Y_i}}. \]
AKA \({\color{#BF616A}e_i}\).
\(\hat{\beta_1} = 15.4\) and \(\hat{\beta_2}=7.94\)
Does this line represent the data well?
What if we picked an estimator that minimizes the residuals?
Why not minimize
\[ \sum_{i=1}^n \hat{u}_i^2 \]
so that the estimator makes fewer big misses?
This objective function, the residual sum of squares (RSS), is convenient because squared numbers are never negative, so positive and negative residuals cannot cancel each other out.
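A minimal sketch of the RSS objective, on simulated data (the true line and the candidate guesses below are illustrative): we write RSS as a function of a candidate \((\beta_1, \beta_2)\) and compare two guessed lines.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100

# Illustrative data: the true line is Y = 10 + 3X plus noise.
x = rng.uniform(0, 10, n)
y = 10 + 3 * x + rng.normal(0, 2, n)

def rss(b0, b1):
    """Residual sum of squares for the candidate line b0 + b1*x."""
    resid = y - (b0 + b1 * x)
    return (resid ** 2).sum()

# A guess near the truth earns a far smaller penalty than a bad one,
# because squaring magnifies large misses.
print(rss(10, 3) < rss(60, -7))  # True
```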
RSS gives bigger penalties to bigger residuals.
We could test thousands of guesses of \(\beta_1\) and \(\beta_2\) and pick the pair that has the smallest RSS.
What if we did exactly that? Let’s write a loop that guesses \(\beta_1\) and \(\beta_2\) 100,000 times and collects the RSS for each guess.
Then we plot it in three dimensions: \(\beta_1\) (x), \(\beta_2\) (y), RSS (z)
Simulated RSS across \(\beta_1, \beta_2 \sim \text{uniform}(-100, 100)\)
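The guessing loop above can be sketched as follows (the data, true line, and seed are assumptions for illustration; the 100,000 draws are done in one vectorized pass rather than a literal loop): we draw candidate pairs from \(\text{uniform}(-100, 100)\), compute each pair's RSS, and keep the pair with the smallest RSS.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100

# Illustrative data: Y = 3 + 2X plus noise.
x = rng.uniform(0, 10, n)
y = 3 + 2 * x + rng.normal(0, 2, n)

# Draw 100,000 guesses of (beta_1, beta_2) ~ uniform(-100, 100) and
# compute the RSS of every guess in one vectorized pass.
guesses = rng.uniform(-100, 100, size=(100_000, 2))
preds = guesses[:, [0]] + guesses[:, [1]] * x  # each row: one candidate line
rss = ((y - preds) ** 2).sum(axis=1)

best = guesses[rss.argmin()]  # the guess with the smallest RSS

# Compare with the closed-form least-squares fit: the best random
# guess lands near it, because the RSS surface bottoms out there.
b1_ols, b0_ols = np.polyfit(x, y, 1)
rss_ols = ((y - (b0_ols + b1_ols * x)) ** 2).sum()
print(best, (b0_ols, b1_ols))
```

Plotting `rss` against the two guess columns gives exactly the three-dimensional surface described above, with its minimum at the least-squares solution.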